Matrix Completion by the Principle of Parsimony



Augusto Ferrante and Michele Pavon 



Abstract 



Dempster's covariance selection method is extended first to general nonsingular matrices and 
then to full rank rectangular matrices. Dempster observed that his completion solved a maximum 
entropy problem. We show that our generalized completions are also solutions of a suitable entropy-like 
variational problem. 

Index Terms 

Covariance selection, maximum entropy problem, matrix completion, parsimony principle. 



I. Dempster's covariance selection

In the seminal paper [14], Dempster introduced a general strategy for completing a partially
specified covariance matrix. Consider a zero-mean, multivariate Gaussian distribution with density

$$p(x) = (2\pi)^{-n/2} (\det \Sigma)^{-1/2} \exp\left\{-\tfrac{1}{2}\, x^\top \Sigma^{-1} x\right\}, \qquad x \in \mathbb{R}^n.$$

Suppose that the elements $\{\sigma_{ij};\ 1 \le i \le j \le n,\ (i,j) \in \mathcal{I}\}$ have been specified. How should $\Sigma$
be completed? Dempster resorts to a form of the Principle of Parsimony in parametric model
fitting: as the elements $\sigma^{ij}$ of $\Sigma^{-1}$ appear as natural parameters of the model, one should set $\sigma^{ij}$
to zero for $1 \le i \le j \le n$, $(i,j) \notin \mathcal{I}$. Notice that $\sigma^{ij} = 0$ has the probabilistic interpretation that
the $i$-th and $j$-th components of the Gaussian random vector are conditionally independent given
the other components. This choice, which we henceforth call Dempster's Completion, may
at first look less natural than setting the unspecified elements of $\Sigma$ to zero. It has nevertheless
considerable advantages compared to the latter, cf. [14, p. 161]. In particular, Dempster established
the following far-reaching result.

Theorem 1.1: Assume that a symmetric, positive-definite completion of $\Sigma$ exists. Then there
exists a unique Dempster's Completion $\Sigma^\circ$. This completion maximizes the (differential) entropy

$$H(p) = -\int_{\mathbb{R}^n} \log(p(x))\, p(x)\, dx = \frac{1}{2}\log(\det \Sigma) + \frac{1}{2}\, n\, (1 + \log(2\pi)) \tag{1}$$

among zero-mean Gaussian distributions having the prescribed elements
$\{\sigma_{ij};\ 1 \le i \le j \le n,\ (i,j) \in \mathcal{I}\}$.
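Theorem 1.1 can be illustrated on the smallest nontrivial case. The sketch below assumes a 3x3 covariance whose (1,3) and (3,1) entries are unspecified; the numerical values and the helper names `dempster_3x3` and `logdet_with` are made up for illustration and do not come from the paper. It checks that the completed matrix has a zero in position (1,3) of its inverse and that nearby values of the free entry give a smaller log-determinant.

```python
import numpy as np

def dempster_3x3(s11, s12, s22, s23, s33):
    """Complete a 3x3 covariance whose (1,3) and (3,1) entries are unspecified.

    For this pattern, x = s12 * s23 / s22 annihilates the (1,3) entry of the
    inverse, because the relevant cofactor equals s12 * s23 - s22 * x.
    """
    x = s12 * s23 / s22
    return np.array([[s11, s12, x],
                     [s12, s22, s23],
                     [x,   s23, s33]])

def logdet_with(sigma, x):
    """log det of sigma after overwriting the two free entries with x."""
    m = sigma.copy()
    m[0, 2] = m[2, 0] = x
    return np.linalg.slogdet(m)[1]

# Made-up specified entries, used only for illustration.
sigma0 = dempster_3x3(2.0, 1.0, 2.0, 1.0, 2.0)
x0 = sigma0[0, 2]
inv13 = np.linalg.inv(sigma0)[0, 2]
is_local_max = logdet_with(sigma0, x0) > max(logdet_with(sigma0, x0 - 0.1),
                                             logdet_with(sigma0, x0 + 0.1))
print(x0, inv13, is_local_max)
```

The closed form used here is specific to this 3x3 pattern; for general patterns the completion has to be computed numerically.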

Thus, Dempster's Completion $\Sigma^\circ$ solves a maximum entropy problem, i.e., it maximizes entropy
under linear constraints. Dempster's paper has generated a whole stream of research, see e.g.
[13], [20] and references therein. In the meantime, matrix completion has become an
important area of research with several new applications, where the completed matrix must have
certain prescribed properties: for instance, it should be positive definite, circulant, Toeplitz,
or of a prescribed low rank. Motivation originates from problems in texture image modeling,
recommender systems and networked sensors.

In this paper, we consider a totally unstructured version of Dempster's problem. A square
matrix $\Sigma$ is partially specified and we seek to complete it according to the principle of parsimony.
Besides the above-mentioned applications, we are also motivated by the following problem.
Suppose that we need to solve the linear system

$$\Sigma X = B, \tag{2}$$

where the matrix $B$ is given. Suppose that only the elements $\{\sigma_{ij};\ (i,j) \in \mathcal{I}\}$ of $\Sigma$ could be
estimated/determined, where $\mathcal{I}$ is any subset of $\{1, 2, \dots, n\} \times \{1, 2, \dots, n\}$. As a generic
completion $\hat{\Sigma}$ is invertible, we can associate to such a completion the solution

$$X_{\hat{\Sigma}} = \hat{\Sigma}^{-1} B. \tag{3}$$

We then identify as desirable, according to the principle of parsimony, completions $\hat{\Sigma}$ such that
$\hat{\Sigma}^{-1}$ has a maximum number of zero entries. In this paper, we show that a family of such desirable
completions are generalized Dempster's completions that can be characterized as critical points
of a suitable variational problem.

It may, at first, look hopeless to obtain an entropy-like variational characterization of
Dempster-like completions without positivity. Nevertheless, we prove in Lemmata 3.1 and 3.2 below that
the constrained extremization of the determinant only involves the positive part $(\Sigma\Sigma^\top)^{1/2}$
of the matrix $\Sigma$. More precisely, only the singular values of $\Sigma$ come into play. Hence, such a variational
characterization is possible and may be established even in the rectangular case.

The paper is outlined as follows. We discuss first the square case to facilitate the comparison 
with Dempster's classical results, see Sections III and IV below. In Section V we discuss two
examples to illustrate the properties that our solutions may or may not enjoy. In Section VI we 
generalize our results to rectangular, full rank matrices. The paper concludes with a discussion 
section comparing our approach to other matrix completion techniques and to other moment 
problems. 



II. The general completion problem 

Let $\mathcal{I} \subset \{1, 2, \dots, n\} \times \{1, 2, \dots, n\}$ and let $\bar{\mathcal{I}}$ be the complementary subset. To each
$(i,j) \in \bar{\mathcal{I}}$ we associate the unknown $x_{ij}$. Let $x$ be the vector, say $k$-dimensional, obtained by
stacking the $x_{ij}$ one on top of the other. We define as partial matrix a parametric family of matrices
$\Sigma(x)$ whose entries $[\Sigma(x)]_{ij} = \sigma_{ij}$ are specified for $(i,j) \in \mathcal{I}$, while $[\Sigma(x)]_{ij} = x_{ij}$ for
$(i,j) \in \bar{\mathcal{I}}$.

Here, both $\sigma_{ij}$ and $x_{ij}$ take real values. A completion of the partial matrix is a matrix $\Sigma(x)$
where $x \in \mathbb{R}^k$. Notice that completions always exist as we are not requiring $\Sigma(x)$ to possess
any further property. If $\Sigma(x)$ is a partial matrix and $\bar{\mathcal{I}}$ is the corresponding set of indices of the
unspecified elements, we denote by $\bar{\mathcal{I}}^\top$ the set of indices $\bar{\mathcal{I}}^\top := \{(j,i) : (i,j) \in \bar{\mathcal{I}}\}$.

Let $\Sigma(x)$ be a square partial matrix of size $n$ and let $\bar{\mathcal{I}}$ be the corresponding set of indices of
the unspecified entries. Consider the following matrix completion problems:

Problem 2.1: Find the nonsingular completions $\Sigma(x)$ such that $[\Sigma(x)^{-1}]_{ij} = 0$ for all
$(i,j) \in \bar{\mathcal{I}}^\top$.

Remark. Since in $\Sigma(x)$ we have $|\bar{\mathcal{I}}|$ degrees of freedom (unknowns), we may generically expect
that the maximum number of entries that can be annihilated in $\Sigma(x)^{-1}$ is precisely $|\bar{\mathcal{I}}|$. Indeed,
imposing $m$ zeros in $\Sigma(x)^{-1}$ is equivalent to finding a solution to a system of $m$ polynomial equations
in $|\bar{\mathcal{I}}|$ unknowns on the complement of the set where the determinant vanishes (in Section V
we provide a non-generic example where there is a completion with more than $|\bar{\mathcal{I}}|$ zeros in the
inverse).

Problem 2.2: Find the nonsingular completions $\Sigma(x)$ that extremize $\det[\Sigma(x)]$.

Remark. Notice that when a covariance matrix $\Sigma = \Sigma^\top > 0$ is sought, Problem 2.2 reduces to
the maximum entropy problem solved by Dempster's Completion. This follows from the fact that
the entropy, in the Gaussian case, differs from $(1/2) \log \det \Sigma$ by a constant, the monotonicity
of the logarithm and strict concavity of the entropy. Thus, Problem 2.2 appears as a legitimate
generalization of Dempster's classical completion method.



The main result of this paper consists in showing that Problems 2.1 and 2.2 have the same 
set of solutions. 



Theorem 2.3: A completion $\Sigma(\bar{x})$ solves Problem 2.1 if and only if it solves Problem 2.2.



Remark. It is apparent that solutions to Problems 2.1 and 2.2 may not exist, but when they do 
there may be many. For instance, consider 



S(x) 




Then, Problems 2.1 and 2.2 are not solvable. Actually, whenever all the unknowns $x_{ij}$ are in the
same row or in the same column, $\det(\Sigma(x))$ is linear in $x_{ij}$ and hence it does not have critical
points. Two examples where there exist multiple solutions are provided in Section V.
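The mechanism behind this remark can be seen in the smallest case. The following worked computation uses a hypothetical $2 \times 2$ partial matrix with generic specified entries $a$, $c$, $d$ and a single unknown $x$; it is introduced here only for illustration.

```latex
\Sigma(x) = \begin{pmatrix} a & x \\ c & d \end{pmatrix}, \qquad
\det[\Sigma(x)] = ad - cx, \qquad
\frac{\partial}{\partial x}\det[\Sigma(x)] = -c,
\qquad
[\Sigma(x)^{-1}]_{21} = \frac{-c}{ad - cx}.
```

If $c \neq 0$, the derivative never vanishes, so the determinant has no critical point; consistently, the $(2,1)$ entry of the inverse is nonzero for every nonsingular completion, so neither problem has a solution.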



III. Some preliminary results



We collect below some lemmata that are needed to prove Theorem 2.3.



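Lemma 3.1: Problem 2.2 is equivalent to the following: Find a nonsingular completion $\Sigma(x)$
that extremizes $J(\Sigma(x)) := \log|\det[\Sigma(x)]|$.

Proof: Compute the gradient

$$\frac{\partial}{\partial x_{ij}} \log|\det[\Sigma(x)]|
= \frac{1}{|\det[\Sigma(x)]|}\, \frac{\partial}{\partial x_{ij}} |\det[\Sigma(x)]|
= \frac{1}{|\det[\Sigma(x)]|}\, \frac{|\det[\Sigma(x)]|}{\det[\Sigma(x)]}\, \frac{\partial}{\partial x_{ij}} \det[\Sigma(x)]
= \frac{1}{\det[\Sigma(x)]}\, \frac{\partial}{\partial x_{ij}} \det[\Sigma(x)]. \tag{4}$$

Now the statement follows by observing that we are restricting attention to nonsingular
completions. □

Denote by $D[J(\Sigma); \delta\Sigma]$ the directional derivative of $J$ in direction $\delta\Sigma \in \mathbb{R}^{n \times n}$:

$$D[J(\Sigma); \delta\Sigma] := \lim_{\varepsilon \to 0} \frac{J(\Sigma + \varepsilon\, \delta\Sigma) - J(\Sigma)}{\varepsilon}. \tag{5}$$

We have the following result.

Lemma 3.2: Let $J(\Sigma(x)) = \log|\det[\Sigma(x)]|$ as in the previous lemma. If $\Sigma$ is nonsingular
then, for any $\delta\Sigma \in \mathbb{R}^{n \times n}$,

$$D[J(\Sigma); \delta\Sigma] = \operatorname{tr}\left[\Sigma^{-1} \delta\Sigma\right]. \tag{6}$$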

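Proof: Let

$$P(\Sigma) := (\Sigma\Sigma^\top)^{1/2}, \qquad U(\Sigma) := P(\Sigma)^{-1}\Sigma. \tag{7}$$

Observe that $\Sigma = P(\Sigma)U(\Sigma)$ is the polar decomposition of $\Sigma$. Similarly, we have
$\Sigma_\varepsilon := \Sigma + \varepsilon\, \delta\Sigma = P(\Sigma_\varepsilon)U(\Sigma_\varepsilon)$. Consider now the Taylor expansion

$$P(\Sigma_\varepsilon) = P(\Sigma) + \varepsilon\, \delta P + o(\varepsilon), \tag{8}$$

where $\delta P := D[P(\Sigma); \delta\Sigma]$. In view of (7), the latter directional derivative may be expressed as

$$\delta P = D[P(\Sigma); \delta\Sigma] = D[\Sigma U^\top(\Sigma); \delta\Sigma] = \delta\Sigma\, U^\top(\Sigma) + \Sigma\, D[U^\top(\Sigma); \delta\Sigma]. \tag{9}$$

Notice now that

$$\log|\det[\Sigma]| = \log[\det[P(\Sigma)]]. \tag{10}$$

Moreover, if $Q = Q^\top > 0$, the following expression holds:

$$D[\log[\det(Q)]; \delta Q] = \operatorname{tr}\left[Q^{-1} \delta Q\right]. \tag{11}$$

From (10), (11) and (9), we now get:

$$\begin{aligned}
D[J(\Sigma); \delta\Sigma] &= \operatorname{tr}\left[P(\Sigma)^{-1} \delta P\right] \\
&= \operatorname{tr}\left[P(\Sigma)^{-1} \delta\Sigma\, U^\top(\Sigma)\right] + \operatorname{tr}\left[P(\Sigma)^{-1} \Sigma\, D[U^\top(\Sigma); \delta\Sigma]\right] \\
&= \operatorname{tr}\left[\Sigma^{-1} \delta\Sigma\right] + \operatorname{tr}\left[U(\Sigma)\, D[U^\top(\Sigma); \delta\Sigma]\right].
\end{aligned} \tag{12}$$

The result now follows by observing that $\operatorname{tr}[U(\Sigma)\, D[U^\top(\Sigma); \delta\Sigma]] = 0$. Indeed,

$$\begin{aligned}
0 &= \operatorname{tr}\left[D[U(\Sigma)U^\top(\Sigma); \delta\Sigma]\right] \\
&= \operatorname{tr}\left[U(\Sigma)\, D[U^\top(\Sigma); \delta\Sigma]\right] + \operatorname{tr}\left[D[U(\Sigma); \delta\Sigma]\, U^\top(\Sigma)\right] \\
&= 2 \operatorname{tr}\left[U(\Sigma)\, D[U^\top(\Sigma); \delta\Sigma]\right].
\end{aligned} \tag{13}$$
□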
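The identity of Lemma 3.2 lends itself to a quick numerical sanity check. The sketch below compares a central finite difference of $\log|\det|$ with the trace formula for a nonsymmetric matrix, once with positive and once with negative determinant. The test matrix, the step size `eps` and the helper names are illustrative choices, not part of the paper.

```python
import numpy as np

def J(m):
    """log |det m|; np.linalg.slogdet returns (sign, log|det|)."""
    return np.linalg.slogdet(m)[1]

def check(sigma, delta, eps=1e-6):
    """Return (central finite difference of J, tr[sigma^{-1} delta])."""
    fd = (J(sigma + eps * delta) - J(sigma - eps * delta)) / (2.0 * eps)
    formula = np.trace(np.linalg.solve(sigma, delta))
    return fd, formula

# Strictly diagonally dominant, hence nonsingular; deliberately not symmetric.
sigma = np.array([[ 4.0, 1.0,  0.0, 2.0],
                  [-1.0, 3.0,  1.0, 0.0],
                  [ 0.0, 2.0,  5.0, 1.0],
                  [ 1.0, 0.0, -1.0, 3.0]])
delta = np.random.default_rng(0).standard_normal((4, 4))  # arbitrary direction

fd_pos, tr_pos = check(sigma, delta)
# Swapping two rows flips the sign of the determinant; the identity still holds.
sigma_neg = sigma[[1, 0, 2, 3], :]
fd_neg, tr_neg = check(sigma_neg, delta)
print(fd_pos, tr_pos, fd_neg, tr_neg)
```

The second case is the one that motivates working with $|\det|$ rather than $\det$: no positivity or symmetry of the matrix is used.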

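Lemma 3.3: Consider the space $\mathbb{R}^{n \times n}$ endowed with the inner product $\langle M_1, M_2 \rangle := \operatorname{tr}[M_1^\top M_2]$.
Let $\mathcal{M}$ be the subspace of $\mathbb{R}^{n \times n}$ consisting of the matrices whose entries in position $(i,j) \in \mathcal{I}$
are zero. Let $M \in \mathcal{M}^\perp$. Then, $[M]_{ij} = 0$ for all $(i,j) \in \bar{\mathcal{I}}$.

Proof: Denote by $e_i$ the $i$-th canonical vector in $\mathbb{R}^n$. Clearly, for any $(i,j) \in \bar{\mathcal{I}}$, $e_i e_j^\top \in \mathcal{M}$.
Thus, if $M \in \mathcal{M}^\perp$, for any $(i,j) \in \bar{\mathcal{I}}$, we have
$0 = \langle e_i e_j^\top, M \rangle = \operatorname{tr}[(e_i e_j^\top)^\top M] = \operatorname{tr}[e_j e_i^\top M] = e_i^\top M e_j = [M]_{ij}$. □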



IV. Proof of Theorem 2.3
We are now ready to prove our main result. 

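Proof of Theorem 2.3: In view of Lemma 3.1, Problem 2.2 is equivalent to the following
variational problem:

$$\text{extremize } \left\{ J(\Sigma) : e_i^\top \Sigma e_j = \sigma_{ij},\ (i,j) \in \mathcal{I} \right\}, \tag{14}$$

where, as before, $J(\Sigma) = \log|\det[\Sigma]|$. The corresponding Lagrangian is

$$\mathcal{L}(\Sigma) = J(\Sigma) + \sum_{(i,j) \in \mathcal{I}} \lambda_{ij} \left(e_i^\top \Sigma e_j - \sigma_{ij}\right), \tag{15}$$

whose unconstrained extremization is obtained by annihilating the directional derivative of $\mathcal{L}$ in
any direction $\delta\Sigma \in \mathbb{R}^{n \times n}$. In view of Lemma 3.2, this yields:

$$\operatorname{tr}\left[\left(\Sigma^{-1} + \sum_{(i,j) \in \mathcal{I}} \lambda_{ij}\, e_j e_i^\top\right) \delta\Sigma\right] = 0, \qquad \forall\, \delta\Sigma \in \mathbb{R}^{n \times n}, \tag{16}$$

or, equivalently,

$$\Sigma^{-1} = -\sum_{(i,j) \in \mathcal{I}} \lambda_{ij}\, e_j e_i^\top. \tag{17}$$

It follows, in particular, that the inverse of any nonsingular critical point $\Sigma$ of the Lagrangian (15)
has zeros in positions $(i,j) \in \bar{\mathcal{I}}^\top$. Moreover, if we can find $\lambda^\circ_{ij}$ such that the matrix
$\sum_{(i,j) \in \mathcal{I}} \lambda^\circ_{ij}\, e_j e_i^\top$ is nonsingular and

$$\Sigma^\circ := \left(-\sum_{(i,j) \in \mathcal{I}} \lambda^\circ_{ij}\, e_j e_i^\top\right)^{-1} \tag{18}$$

satisfies

$$e_i^\top \Sigma^\circ e_j = \sigma_{ij}, \qquad (i,j) \in \mathcal{I}, \tag{19}$$

then $\Sigma^\circ$ is indeed a solution of Problem 2.2.

Assume now that $\Sigma(\bar{x})$ solves Problem 2.1. Then, $\Sigma(\bar{x})^{-1}$ has the form (17). Moreover, $\Sigma(\bar{x})$
satisfies (19). Hence, it solves Problem 2.2.

Conversely, let $\Sigma(\bar{x})$ solve Problem 2.2. This is equivalent to

$$D[J(\Sigma); \delta\Sigma]\big|_{\Sigma = \Sigma(\bar{x})} = \operatorname{tr}\left[\Sigma(\bar{x})^{-1} \delta\Sigma\right] = \left\langle \Sigma(\bar{x})^{-\top}, \delta\Sigma \right\rangle = 0, \qquad \forall\, \delta\Sigma \in \mathcal{M}, \tag{20}$$

where $\mathcal{M}$ is the subspace of $\mathbb{R}^{n \times n}$ (defined in Lemma 3.3) consisting of matrices whose entries
in position $(i,j) \in \mathcal{I}$ are zero. Thus, $\Sigma(\bar{x})^{-\top} \in \mathcal{M}^\perp$. By Lemma 3.3, $[\Sigma(\bar{x})^{-1}]_{ij} = 0$ for all
$(i,j) \in \bar{\mathcal{I}}^\top$, namely $\Sigma(\bar{x})$ solves Problem 2.1. □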



V. Two illustrative examples

Example 1. The following example shows that in some pathological situations it is indeed
possible to complete a partial matrix in such a way that the completion is symmetric and positive
definite and its inverse has a larger number of vanishing entries than the inverse of the Dempster
completion. Consider the matrix

$$\Sigma(x) = \begin{pmatrix}
\frac{120}{929} & \frac{4}{929} & -\frac{15}{929} & x \\
\frac{4}{929} & \frac{124}{929} & -\frac{1}{1858} & -\frac{63}{1858} \\
-\frac{15}{929} & -\frac{1}{1858} & \frac{118}{929} & \frac{2}{929} \\
x & -\frac{63}{1858} & \frac{2}{929} & \frac{126}{929}
\end{pmatrix}.$$

Dempster's completion corresponds to $x_D = -(79/58527)$. The associated matrix $\Sigma(x_D)$
has the following inverse:

$$\Sigma(x_D)^{-1} = \begin{pmatrix}
\frac{63}{8} & -\frac{1}{4} & 1 & 0 \\
-\frac{1}{4} & \frac{1009}{126} & -\frac{2}{63} & 2 \\
1 & -\frac{2}{63} & \frac{4033}{504} & -\frac{1}{8} \\
0 & 2 & -\frac{1}{8} & \frac{63}{8}
\end{pmatrix}.$$

If, on the other hand, we pick $x = x_M = -(16/929)$, the associated matrix $\Sigma(x_M)$ has the
following inverse:

$$\Sigma(x_M)^{-1} = \begin{pmatrix}
8 & 0 & 1 & 1 \\
0 & 8 & 0 & 2 \\
1 & 0 & 8 & 0 \\
1 & 2 & 0 & 8
\end{pmatrix},$$

which has six vanishing entries. Notice that, as for the Dempster's Completion, this extension
is symmetric and positive definite.
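The two inverses of Example 1 can be checked in exact rational arithmetic. The sketch below multiplies each completion by its claimed inverse and compares the product with the identity; the helper names `partial` and `matmul` are illustrative, and only the Python standard library is used.

```python
from fractions import Fraction as F

def partial(x):
    """Partial matrix of Example 1 with the free entry x in positions (1,4) and (4,1)."""
    return [
        [F(120, 929), F(4, 929),    F(-15, 929), x],
        [F(4, 929),   F(124, 929),  F(-1, 1858), F(-63, 1858)],
        [F(-15, 929), F(-1, 1858),  F(118, 929), F(2, 929)],
        [x,           F(-63, 1858), F(2, 929),   F(126, 929)],
    ]

def matmul(a, b):
    """Plain 4x4 matrix product on nested lists of Fractions."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

identity = [[F(int(i == j)) for j in range(4)] for i in range(4)]

inv_dempster = [
    [F(63, 8), F(-1, 4),     F(1),         F(0)],
    [F(-1, 4), F(1009, 126), F(-2, 63),    F(2)],
    [F(1),     F(-2, 63),    F(4033, 504), F(-1, 8)],
    [F(0),     F(2),         F(-1, 8),     F(63, 8)],
]
inv_sparse = [
    [F(8), F(0), F(1), F(1)],
    [F(0), F(8), F(0), F(2)],
    [F(1), F(0), F(8), F(0)],
    [F(1), F(2), F(0), F(8)],
]

ok_dempster = matmul(partial(F(-79, 58527)), inv_dempster) == identity
ok_sparse = matmul(partial(F(-16, 929)), inv_sparse) == identity
zeros_dempster = sum(v == 0 for row in inv_dempster for v in row)
zeros_sparse = sum(v == 0 for row in inv_sparse for v in row)
print(ok_dempster, ok_sparse, zeros_dempster, zeros_sparse)
```

Exact fractions are used instead of floating point so that the zero pattern is verified without any tolerance.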



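Example 2. Let

$$\Sigma(x) = \begin{pmatrix}
5 & 2 & x & 1 \\
2 & 5 & 2 & y \\
w & 2 & 5 & 2 \\
1 & z & 2 & 5
\end{pmatrix}.$$

Notice that the specified entries of $\Sigma(x)$ are compatible with symmetry and the Toeplitz structure.
By extremizing $\det[\Sigma(x)]$, we obtain seven real matrices completing $\Sigma(x)$ to a nonsingular
matrix:

$$\Sigma(x_1) = \begin{pmatrix} 5 & 2 & -6 & 1 \\ 2 & 5 & 2 & -6 \\ -6 & 2 & 5 & 2 \\ 1 & -6 & 2 & 5 \end{pmatrix}, \qquad
\Sigma(x_2) = \begin{pmatrix} 5 & 2 & 5 & 1 \\ 2 & 5 & 2 & -\frac{19}{5} \\ 5 & 2 & 5 & 2 \\ 1 & -\frac{19}{5} & 2 & 5 \end{pmatrix},$$

$$\Sigma(x_3) = \begin{pmatrix} 5 & 2 & 1 & 1 \\ 2 & 5 & 2 & 1 \\ 1 & 2 & 5 & 2 \\ 1 & 1 & 2 & 5 \end{pmatrix}, \qquad
\Sigma(x_4) = \begin{pmatrix} 5 & 2 & -\frac{19}{5} & 1 \\ 2 & 5 & 2 & 5 \\ -\frac{19}{5} & 2 & 5 & 2 \\ 1 & 5 & 2 & 5 \end{pmatrix},$$

$$\Sigma(x_5) = \begin{pmatrix} 5 & 2 & 5 & 1 \\ 2 & 5 & 2 & 5 \\ 5 & 2 & 5 & 2 \\ 1 & 5 & 2 & 5 \end{pmatrix}, \qquad
\Sigma(x_6) = \begin{pmatrix} 5 & 2 & -\frac{3}{2}(\sqrt{13}-5) & 1 \\ 2 & 5 & 2 & -\frac{3}{2}(\sqrt{13}-5) \\ \frac{3}{2}(5+\sqrt{13}) & 2 & 5 & 2 \\ 1 & \frac{3}{2}(5+\sqrt{13}) & 2 & 5 \end{pmatrix},$$

$$\Sigma(x_7) = \begin{pmatrix} 5 & 2 & \frac{3}{2}(5+\sqrt{13}) & 1 \\ 2 & 5 & 2 & \frac{3}{2}(5+\sqrt{13}) \\ -\frac{3}{2}(\sqrt{13}-5) & 2 & 5 & 2 \\ 1 & -\frac{3}{2}(\sqrt{13}-5) & 2 & 5 \end{pmatrix}.$$

All of these completions have inverse with zeros in positions (1,3), (2,4), (3,1) and (4,2).
Notice that only $\Sigma(x_1)$, $\Sigma(x_3)$, and $\Sigma(x_5)$ are symmetric and have a Toeplitz structure. Among
these, only $\Sigma(x_3)$ is also positive definite ($\Sigma(x_3)$ is indeed the Dempster Completion). $\Sigma(x_2)$,
$\Sigma(x_4)$ are symmetric but do not have a Toeplitz structure and $\Sigma(x_6)$, $\Sigma(x_7)$ have a Toeplitz
structure but they are not symmetric.

We observe that all of the 5 symmetric completions are also solutions of the problem of
extremizing

$$\det \begin{pmatrix} 5 & 2 & x & 1 \\ 2 & 5 & 2 & y \\ x & 2 & 5 & 2 \\ 1 & y & 2 & 5 \end{pmatrix}.$$

Similarly, all of the 5 Toeplitz completions are also solutions of the problem of extremizing

$$\det \begin{pmatrix} 5 & 2 & x & 1 \\ 2 & 5 & 2 & x \\ y & 2 & 5 & 2 \\ 1 & y & 2 & 5 \end{pmatrix}.$$

This example shows that even if the constraints are compatible with symmetry or other matrix
properties, there may exist extremizing completions that do not preserve these features.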
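The claims of Example 2 can be checked numerically. The sketch below assumes the unknowns are placed as $x$ in (1,3), $y$ in (2,4), $w$ in (3,1) and $z$ in (4,2), builds seven candidate completions with free entries $-6$, $1$, $5$, $-19/5$ and $\frac{3}{2}(5 \pm \sqrt{13})$, and measures the largest absolute entry of each inverse in the four free positions. The ordering of the candidates and the helper names are illustrative.

```python
import numpy as np

def completion(x, y, w, z):
    """4x4 completion with free entries x (1,3), y (2,4), w (3,1), z (4,2)."""
    return np.array([[5.0, 2.0, x,   1.0],
                     [2.0, 5.0, 2.0, y],
                     [w,   2.0, 5.0, 2.0],
                     [1.0, z,   2.0, 5.0]])

r = np.sqrt(13.0)
c_plus, c_minus = 1.5 * (5.0 + r), 1.5 * (5.0 - r)
candidates = [
    (-6.0, -6.0, -6.0, -6.0),            # symmetric Toeplitz
    (5.0, -19 / 5, 5.0, -19 / 5),        # symmetric, not Toeplitz
    (1.0, 1.0, 1.0, 1.0),                # symmetric Toeplitz, positive definite
    (-19 / 5, 5.0, -19 / 5, 5.0),        # symmetric, not Toeplitz
    (5.0, 5.0, 5.0, 5.0),                # symmetric Toeplitz
    (c_minus, c_minus, c_plus, c_plus),  # Toeplitz, not symmetric
    (c_plus, c_plus, c_minus, c_minus),  # Toeplitz, not symmetric
]

# 0-based positions (1,3), (2,4), (3,1), (4,2); this set equals its own transpose.
free = [(0, 2), (1, 3), (2, 0), (3, 1)]

def is_pd(m):
    """True only for symmetric matrices with strictly positive eigenvalues."""
    return bool(np.allclose(m, m.T) and np.all(np.linalg.eigvalsh(m) > 0))

residuals, pd_flags = [], []
for params in candidates:
    sigma = completion(*params)
    inv = np.linalg.inv(sigma)
    residuals.append(max(abs(inv[i, j]) for i, j in free))
    pd_flags.append(is_pd(sigma))
print(residuals, pd_flags)
```

All seven residuals should be at rounding level, and only the all-ones choice of the free entries should be flagged as positive definite.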



VI. The case of rectangular matrices 

Next, we extend the results obtained in the previous sections to the general case of possibly
non-square matrices $\Sigma(x) \in \mathbb{R}^{n \times p}$. We assume that $p \ge n$ (the case $p < n$ can be dealt with
in a dual fashion). As before, let $\mathcal{I} \subset \{1, 2, \dots, n\} \times \{1, 2, \dots, p\}$, and let $\bar{\mathcal{I}}$ be the complementary
subset. To each $(i,j) \in \bar{\mathcal{I}}$ we associate the unknown $x_{ij}$. Let $x$ be the $k$-dimensional vector
obtained by stacking the $x_{ij}$ one on top of the other. Define as before a partial matrix to be a
parametric family of matrices $\Sigma(x)$ whose entries $[\Sigma(x)]_{ij} = \sigma_{ij}$ are specified for $(i,j) \in \mathcal{I}$,
while $[\Sigma(x)]_{ij} = x_{ij}$ for $(i,j) \in \bar{\mathcal{I}}$. Again, $\sigma_{ij}$ and $x_{ij}$ take real values. Consider the following
matrix completion problems:

Problem 6.1: Find full row rank completions $\Sigma(x)$ such that the corresponding Moore-Penrose
pseudo-inverse $\Sigma(x)^\sharp$ satisfies $[\Sigma(x)^\sharp]_{ij} = 0$ for all $(i,j) \in \bar{\mathcal{I}}^\top$.

Notice that, since $\Sigma(x)$ is full row rank, the Moore-Penrose pseudo-inverse is also a right-inverse
and is explicitly given by

$$\Sigma(x)^\sharp = \Sigma(x)^\top \left(\Sigma(x)\Sigma(x)^\top\right)^{-1}. \tag{21}$$

Problem 6.2: Find full row rank completions $\Sigma(x)$ that extremize $\det\left[\left(\Sigma(x)\Sigma(x)^\top\right)^{1/2}\right]$.

Remark. The form of the index in Problem 6.2 is inspired by the fact, established in Lemmata
3.1 and 3.2, that in the square case the variational analysis only depends on the positive part
$P(\Sigma(x)) = [\Sigma(x)\Sigma(x)^\top]^{1/2}$ of $\Sigma(x)$. Actually, only the singular values of $\Sigma(x)$ come into play.
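The role of the singular values can be made concrete with a short numerical check. The sketch below, on arbitrary illustrative matrices, verifies that $|\det \Sigma|$ equals the product of the singular values in the square case, and that $\log\det[(\Sigma\Sigma^\top)^{1/2}]$ equals the sum of the logarithms of the singular values for a full row rank rectangular matrix.

```python
import numpy as np

# Arbitrary nonsingular 3x3 matrix (illustrative values only).
square = np.array([[2.0, -1.0,  0.5],
                   [0.0,  1.0,  3.0],
                   [1.0,  4.0, -2.0]])
sv_square = np.linalg.svd(square, compute_uv=False)
abs_det = abs(np.linalg.det(square))
prod_sv = float(np.prod(sv_square))

# Full row rank 2x3 matrix obtained by dropping the last row.
rect = square[:2, :]
sv_rect = np.linalg.svd(rect, compute_uv=False)
# log det (rect rect^T)^{1/2} = 0.5 * log det (rect rect^T)
index_value = 0.5 * np.linalg.slogdet(rect @ rect.T)[1]
sum_log_sv = float(np.sum(np.log(sv_rect)))
print(abs_det, prod_sv, index_value, sum_log_sv)
```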



The main result of this section consists in showing that Problems 6.1 and 6.2 are equivalent.

Theorem 6.3: A completion $\Sigma(\bar{x})$ solves Problem 6.1 if and only if it solves Problem 6.2.



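We first notice that Problem 6.2 is equivalent to the following: Find a full row rank completion
$\Sigma(x)$ that extremizes

$$J(\Sigma(x)) := \log \det\left[\left(\Sigma(x)\Sigma(x)^\top\right)^{1/2}\right]. \tag{22}$$

Denote by $D[J(\Sigma); \delta\Sigma]$ the directional derivative of $J$ in direction $\delta\Sigma \in \mathbb{R}^{n \times p}$:

$$D[J(\Sigma); \delta\Sigma] := \lim_{\varepsilon \to 0} \frac{J(\Sigma + \varepsilon\, \delta\Sigma) - J(\Sigma)}{\varepsilon}. \tag{23}$$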



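We have

Lemma 6.4: If $\Sigma$ is full row rank then, for any $\delta\Sigma \in \mathbb{R}^{n \times p}$,

$$D[J(\Sigma); \delta\Sigma] = \operatorname{tr}\left[\Sigma^\sharp\, \delta\Sigma\right]. \tag{24}$$

Proof: Let

$$P(\Sigma) := (\Sigma\Sigma^\top)^{1/2}, \qquad U(\Sigma) := P(\Sigma)^{-1}\Sigma. \tag{25}$$

Observe that $\Sigma = P(\Sigma)U(\Sigma)$ is the generalized polar decomposition of $\Sigma$. In particular, $U(\Sigma)$
is a matrix whose rows are orthonormal: $U(\Sigma)U(\Sigma)^\top = I$. From (21), we get the following
representation for the Moore-Penrose pseudo-inverse of $\Sigma$:

$$\Sigma^\sharp = U(\Sigma)^\top P(\Sigma)^{-1}. \tag{26}$$

Similarly, we have $\Sigma_\varepsilon := \Sigma + \varepsilon\, \delta\Sigma = P(\Sigma_\varepsilon)U(\Sigma_\varepsilon)$. Consider now the Taylor expansion

$$P(\Sigma_\varepsilon) = P(\Sigma) + \varepsilon\, \delta P + o(\varepsilon), \tag{27}$$

where $\delta P := D[P(\Sigma); \delta\Sigma]$. In view of (25), the latter directional derivative may be expressed as

$$\delta P = D[P(\Sigma); \delta\Sigma] = D[\Sigma U^\top(\Sigma); \delta\Sigma] = \delta\Sigma\, U^\top(\Sigma) + \Sigma\, D[U^\top(\Sigma); \delta\Sigma]. \tag{28}$$

From (11), (28) and (26), using the cyclic property of the trace, we now get:

$$\begin{aligned}
D[J(\Sigma); \delta\Sigma] &= \operatorname{tr}\left[P(\Sigma)^{-1} \delta P\right] \\
&= \operatorname{tr}\left[P(\Sigma)^{-1} \delta\Sigma\, U^\top(\Sigma)\right] + \operatorname{tr}\left[P(\Sigma)^{-1} \Sigma\, D[U^\top(\Sigma); \delta\Sigma]\right] \\
&= \operatorname{tr}\left[\Sigma^\sharp\, \delta\Sigma\right] + \operatorname{tr}\left[U(\Sigma)\, D[U^\top(\Sigma); \delta\Sigma]\right].
\end{aligned} \tag{29}$$

The result now follows by observing that $\operatorname{tr}[U(\Sigma)\, D[U^\top(\Sigma); \delta\Sigma]] = 0$ as in (13).

□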
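As in the square case, Lemma 6.4 can be sanity-checked with finite differences. The sketch below uses an illustrative full row rank $2 \times 4$ matrix, compares a central difference of the index (22) with the trace formula (24), and also confirms that NumPy's pseudo-inverse agrees with the explicit right-inverse (21). The matrix, direction and step size are arbitrary choices.

```python
import numpy as np

def J(m):
    """log det (m m^T)^{1/2}, computed as 0.5 * log det (m m^T)."""
    return 0.5 * np.linalg.slogdet(m @ m.T)[1]

# Illustrative 2x4 matrix with linearly independent rows.
sigma = np.array([[1.0, 2.0, 0.0, -1.0],
                  [0.0, 1.0, 3.0,  2.0]])
delta = np.random.default_rng(1).standard_normal(sigma.shape)  # arbitrary direction

eps = 1e-6
fd = (J(sigma + eps * delta) - J(sigma - eps * delta)) / (2.0 * eps)

pinv = np.linalg.pinv(sigma)                             # Moore-Penrose pseudo-inverse
right_inverse = sigma.T @ np.linalg.inv(sigma @ sigma.T)  # formula (21)
formula = np.trace(pinv @ delta)                          # tr[Sigma^sharp dSigma]
print(fd, formula)
```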



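The following result is a simple generalization of Lemma 3.3.

Lemma 6.5: Consider the space $\mathbb{R}^{n \times p}$ endowed with the inner product $\langle M_1, M_2 \rangle := \operatorname{tr}[M_1^\top M_2]$.
Let $\mathcal{M}$ be the subspace of $\mathbb{R}^{n \times p}$ consisting of the matrices whose entries in position $(i,j) \in \mathcal{I}$
are zero. Let $M \in \mathcal{M}^\perp$. Then, $[M]_{ij} = 0$ for all $(i,j) \in \bar{\mathcal{I}}$.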



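Proof of Theorem 6.3: Problem 6.2 is equivalent to the following variational problem

$$\text{extremize } \left\{ J(\Sigma) : e_i^\top \Sigma e_j = \sigma_{ij},\ (i,j) \in \mathcal{I} \right\}, \tag{30}$$

where $J$ is given by (22), and $e_i$, $e_j$ denote canonical vectors of $\mathbb{R}^n$ and $\mathbb{R}^p$, respectively.
The corresponding Lagrangian is

$$\mathcal{L}(\Sigma) = J(\Sigma) + \sum_{(i,j) \in \mathcal{I}} \lambda_{ij} \left(e_i^\top \Sigma e_j - \sigma_{ij}\right), \tag{31}$$

whose unconstrained extremization is obtained by annihilating the directional derivative of $\mathcal{L}$ in
any direction $\delta\Sigma$. In view of Lemma 6.4, we get:

$$\operatorname{tr}\left[\left(\Sigma^\sharp + \sum_{(i,j) \in \mathcal{I}} \lambda_{ij}\, e_j e_i^\top\right) \delta\Sigma\right] = 0, \qquad \forall\, \delta\Sigma \in \mathbb{R}^{n \times p}, \tag{32}$$

or, equivalently,

$$\Sigma^\sharp = -\sum_{(i,j) \in \mathcal{I}} \lambda_{ij}\, e_j e_i^\top. \tag{33}$$

It follows, in particular, that the Moore-Penrose pseudo-inverse of any full row rank critical point
$\Sigma$ of the Lagrangian (31) has zeros in positions $(i,j) \in \bar{\mathcal{I}}^\top$. Moreover, if we can find $\lambda^\circ_{ij}$ such
that the matrix $\sum_{(i,j) \in \mathcal{I}} \lambda^\circ_{ij}\, e_j e_i^\top$ has full column rank and

$$\Sigma^\circ := \left(-\sum_{(i,j) \in \mathcal{I}} \lambda^\circ_{ij}\, e_j e_i^\top\right)^\sharp \tag{34}$$

satisfies

$$e_i^\top \Sigma^\circ e_j = \sigma_{ij}, \qquad (i,j) \in \mathcal{I}, \tag{35}$$

then $\Sigma^\circ$ is indeed a solution of Problem 6.2.

Assume now that $\Sigma(\bar{x})$ solves Problem 6.1. Then, $\Sigma(\bar{x})$ is full row rank and $\Sigma(\bar{x})^\sharp$ has the
form (33). Moreover, $\Sigma(\bar{x})$ satisfies (35). Hence, it solves Problem 6.2.

Conversely, let $\Sigma(\bar{x})$ solve Problem 6.2. This is equivalent to

$$D[J(\Sigma); \delta\Sigma]\big|_{\Sigma = \Sigma(\bar{x})} = \operatorname{tr}\left[\Sigma(\bar{x})^\sharp\, \delta\Sigma\right] = \left\langle [\Sigma(\bar{x})^\sharp]^\top, \delta\Sigma \right\rangle = 0, \qquad \forall\, \delta\Sigma \in \mathcal{M}, \tag{36}$$

where $\mathcal{M}$ is the subspace of $\mathbb{R}^{n \times p}$ (defined in Lemma 6.5) consisting of matrices whose entries
in position $(i,j) \in \mathcal{I}$ are zero. Thus, $[\Sigma(\bar{x})^\sharp]^\top \in \mathcal{M}^\perp$. By Lemma 6.5, $[\Sigma(\bar{x})^\sharp]_{ij} = 0$ for all
$(i,j) \in \bar{\mathcal{I}}^\top$, namely $\Sigma(\bar{x})$ solves Problem 6.1.

□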
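A minimal worked instance may help fix ideas. The following computation assumes a hypothetical $1 \times 2$ partial matrix with one specified entry $\sigma \neq 0$ and one unknown $x$, so that $\mathcal{I} = \{(1,1)\}$ and $\bar{\mathcal{I}}^\top = \{(2,1)\}$; it is introduced here only for illustration.

```latex
\Sigma(x) = \begin{pmatrix} \sigma & x \end{pmatrix}, \qquad
J(\Sigma(x)) = \tfrac{1}{2}\log\left(\sigma^2 + x^2\right), \qquad
\frac{dJ}{dx} = \frac{x}{\sigma^2 + x^2},
\qquad
\Sigma(x)^\sharp = \frac{1}{\sigma^2 + x^2}\begin{pmatrix} \sigma \\ x \end{pmatrix}.
```

The derivative vanishes exactly when $x = 0$, which is also the only value for which the $(2,1)$ entry of the pseudo-inverse is zero, in agreement with Theorem 6.3. In this instance the critical point is a minimum of $J$, which shows why the problems are stated in terms of extremization rather than maximization.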



We outline again the significance of the above result for the solution of systems of linear
equations. Consider

$$X\Sigma = B, \tag{37}$$

where $B$ is given. Suppose that only certain elements of $\Sigma$ could be estimated/determined. If
the completion $\hat{\Sigma}$ has full row rank, we can associate to it the solution

$$X_{\hat{\Sigma}} = B\hat{\Sigma}^\sharp. \tag{38}$$

Again we identify as desirable, according to the principle of parsimony, completions $\hat{\Sigma}$ such that
$\hat{\Sigma}^\sharp$ has a maximum number of zero entries.



VII. Discussion

The nuclear norm (sum of singular values) of a matrix is often used in convex heuristics for
rank minimization problems in control, signal processing, and statistics. It has been employed in
a series of recent papers on matrix completion, see [9], [10], [23], [27] and references therein.
The renewed interest in this metric has both theoretical and practical reasons as argued in the
above-mentioned papers. Variational problems involving the sum of the logarithm of the singular
values (the logarithm of the determinant in the covariance case), such as those presented in this
paper, occupy a somewhat complementary place. Indeed, as we have shown above, they lead to
constraints on the (pseudo-)inverse of $\Sigma$. Moreover, in the case when $\Sigma$ is a covariance matrix,
Dempster's Completion $\Sigma^\circ$ maximizes entropy, namely the sum of the logarithm of the singular
values (eigenvalues), whereas in [9], [10], [23], [27] the sum of the singular values is minimized.

Our variational problems appear close in spirit to Jaynes [21], [22], where, following in the
footsteps of Boltzmann (1877), Schrödinger [30], Cramér [12], Sanov [29], etc., and followed by
such coryphaei as Dempster himself [14], Akaike, Burg, etc., he promoted maximum
entropy methods to general inference methods. It might actually be worthwhile to quote an
illuminating passage from the introduction of [22], which deals with spectral analysis: "There are
many different spectral analysis problems, corresponding to different kinds of prior information
about the phenomenon being observed, different kinds of data, different kinds of perturbing noise,
and different objectives. It is, therefore, quite meaningless to pass judgment on the merits of any
proposed method unless one specifies clearly: 'In what class of problems is this method intended
to be used?' Most of the current confusion on these questions is, in the writer's opinion, the
direct result of failure to define the problem explicitly enough." We feel that these considerations
apply equally well to the matrix completion problem.

In this paper, we have chosen to discuss a very general completion problem where no further 
requirement is imposed on the solution matrix. As soon as the solution is required to feature 
some properties, such as being positive definite (with the possible additional constraints of being
Toeplitz, circulant, etc.), the existence of matrices having the prescribed elements and properties
becomes an issue. When existence is guaranteed, it should be apparent that our variational 
analysis can be readily adapted to these more structured problems. 

We finally want to observe that completing a matrix so that it enjoys certain properties [14]
may be viewed as a generalized moment problem. These are problems where a function (a measure,
a matrix, etc.) is sought satisfying certain given moment constraints as in the classical moment
problem [24], but also enjoying further properties. These may take several different forms. We
mention the important case of bounds on the complexity, such as a bound on the degree of the
rational solution, for applications in communications and control engineering, cf. e.g. [3], [6],
[11], [16].